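I wrote an XML parser that parses ASCII files, but I now need to be able to read UTF-8 encoded files. I have the following regular expressions in lex, but they don't match UTF-8. I'm not sure what I'm doing wrong:

    utf_8 [\x00-\xff]*
    bom   [\xEF\xBB\xBF]

and then:

    bom   { fprintf(stderr, "OMG I SAW A BOM"); return BOM; }
    utf_8 { fprintf(stderr, "OMG I SAW A UTF CHAR", yytext[0]); return UTF_8; }

I also have the following grammar rules:

    program : UTF8 '' root ...

where UTF8 is:

    UTF8 : BOM { printf("i saw a bom\n"); }
         |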
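One detail in the `bom` pattern above is worth illustrating. In lex, a bracket expression such as `[\xEF\xBB\xBF]` matches a single byte that is any one of the three listed values, whereas the UTF-8 byte order mark is the three-byte sequence EF BB BF in that order (written without brackets, `\xEF\xBB\xBF`). The following is a minimal C sketch of that difference, not a fix for the whole scanner; the helper names `has_utf8_bom` and `is_bom_byte` are hypothetical and do not come from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The UTF-8 byte order mark: three bytes, in this exact order. */
static const unsigned char UTF8_BOM[3] = { 0xEF, 0xBB, 0xBF };

/* Hypothetical helper: returns 1 if buf begins with the full
 * three-byte BOM sequence, 0 otherwise. This is the sequence
 * test that an unbracketed pattern \xEF\xBB\xBF expresses. */
int has_utf8_bom(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    return len >= sizeof UTF8_BOM
        && memcmp(buf, UTF8_BOM, sizeof UTF8_BOM) == 0;
}

/* Hypothetical helper: returns 1 if c is any ONE of the three
 * BOM bytes. This is the set-membership test that the bracket
 * expression [\xEF\xBB\xBF] expresses for a single byte. */
int is_bom_byte(unsigned char c)
{
    return c == 0xEF || c == 0xBB || c == 0xBF;
}
```

Calling `has_utf8_bom` on the first bytes of the input before handing it to the scanner is one possible way to skip the mark; whether that fits this parser depends on parts of the grammar that are not shown.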
I have XML data formatted like this:

    1,2,3,4,5,6
    9,8,7,6,5,4
    1,2,3,4,5,6
    9,8,7,6,5,4

I am trying to use xmlstarlet to parse this data into a comma-delimited text file. The desired output looks like this:

    TimeAttribute,ChannelAttribute,Data
    01/01/2009 3:00:02 AM,I,1,2,3,4,5,6
    01/01/2009 3:00:02 AM,II,9,8,7,6,5,4
    01/01/2009 3:00:02 AM,I,1,2,3,4,5,6
    01/01/2009 3:00:02 AM,II,9,8,7,6,5,4

The best I could come up with is:

    xmlstarlet